
Keyword Search Result

[Keyword] neural network (855 hits)

Results 261-280 of 855 hits

  • A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats

    Hang CUI  Shoichi HIRASAWA  Hiroaki KOBAYASHI  Hiroyuki TAKIZAWA  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2018/06/13  Vol: E101-D No:9  Page(s): 2307-2314

    Sparse matrix-vector multiplication (SpMV) is a computational kernel widely used in many applications. Because of its importance, many different implementations have been proposed to accelerate it. The performance characteristics of those SpMV implementations differ considerably, and it is generally difficult to select the best-performing implementation for a given sparse matrix without performance profiling. One existing approach to this best-code selection problem uses manually predefined features and a machine learning model. However, it is hard to define features by hand that fully express the characteristics of the original sparse matrix needed for code selection, and such features inevitably lose some information. This paper therefore presents an effective deep learning mechanism for selecting the SpMV code best suited to a given sparse matrix. Instead of manually predefined features, a feature image and a deep learning network are used to map each sparse matrix, before execution, to the implementation expected to perform best. The benefits of the proposed mechanism are discussed in terms of prediction accuracy and performance. According to the evaluation, the proposed mechanism selects an optimal or near-optimal implementation for an unseen sparse matrix in the test data set in most cases. These results demonstrate that, with deep learning, the whole sparse matrix can be used to predict the best implementation, and that the prediction accuracy achieved by the proposed mechanism is higher than that obtained with predefined features.
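
    As a concrete illustration of the feature-image idea (a minimal sketch, not the authors' code), the snippet below maps a SciPy sparse matrix onto a fixed-size image of non-zero densities, the kind of input a CNN-based format selector could consume; the image size and the use of SciPy are assumptions.

    import numpy as np
    import scipy.sparse as sp

    def feature_image(matrix: sp.spmatrix, size: int = 128) -> np.ndarray:
        """Map an arbitrary sparse matrix to a size x size density image."""
        coo = matrix.tocoo()
        rows = (coo.row * size // matrix.shape[0]).astype(np.int64)
        cols = (coo.col * size // matrix.shape[1]).astype(np.int64)
        image = np.zeros((size, size), dtype=np.float32)
        np.add.at(image, (rows, cols), 1.0)      # count non-zeros falling in each cell
        return image / max(image.max(), 1.0)     # normalise to [0, 1]

    A = sp.random(10000, 10000, density=1e-4, format="csr", random_state=0)
    print(feature_image(A).shape)                # (128, 128), ready for a small CNN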

  • Multi-Channels LSTM Networks for Fence Activity Classification

    Kelu HU  Chunlei ZHENG  Wei HE  Xinghe BAO  Yingguan WANG  

     
    LETTER-Biocybernetics, Neurocomputing

    Publicized: 2018/04/23  Vol: E101-D No:8  Page(s): 2173-2177

    We propose a novel LSTM-based neural network model for classifying data from inertial sensors attached to a fence, with the goal of detecting security-relevant incidents. To evaluate it, we deployed an experimental fence surveillance system. Comparing the experimental results of the different approaches, we find that the neural network outperforms the baseline approach.
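
    A minimal sketch of a multi-channel LSTM classifier for such sensor windows, assuming three inertial channels per fence node and a handful of incident classes; channel count, hidden size and class count are illustrative assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn

    class MultiChannelLSTM(nn.Module):
        def __init__(self, n_channels=3, hidden=64, n_classes=4):
            super().__init__()
            # one LSTM per sensor channel, run independently
            self.lstms = nn.ModuleList(
                nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
                for _ in range(n_channels))
            self.fc = nn.Linear(n_channels * hidden, n_classes)

        def forward(self, x):                    # x: (batch, time, n_channels)
            feats = []
            for c, lstm in enumerate(self.lstms):
                _, (h, _) = lstm(x[:, :, c:c + 1])
                feats.append(h[-1])              # last hidden state of each channel
            return self.fc(torch.cat(feats, dim=1))

    model = MultiChannelLSTM()
    print(model(torch.randn(8, 100, 3)).shape)   # logits for 8 windows: (8, 4)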

  • Predicting Taxi Destination by Regularized RNN with SDZ

    Lei ZHANG  Guoxing ZHANG  Zhizheng LIANG  Qingfu FAN  Yadong LI  

     
    LETTER-Data Engineering, Web Information Systems

    Publicized: 2018/05/02  Vol: E101-D No:8  Page(s): 2141-2144

    Traditional Markov methods for predicting a taxi's destination rely only on the previous two to three GPS points and neglect long-term dependencies within a taxi trajectory. We adopt a recurrent neural network (RNN) to exploit these long-term dependencies for destination prediction, since the multiple hidden layers of an RNN can store them. However, the hidden layers of an RNN are very sensitive to small perturbations, which reduces prediction accuracy as the number of taxi trajectories grows. To improve the prediction accuracy and reduce the training time, we embed surprisal-driven zoneout (SDZ) into the RNN, yielding a taxi destination prediction method based on a regularized RNN with SDZ (TDPRS). SDZ not only improves the robustness of TDPRS but also reduces the training time by updating parameters partially instead of fully. Experiments on a Porto taxi trajectory dataset show that TDPRS improves the prediction accuracy by 12% compared to the RNN prediction method in the literature [4], while reducing the prediction time by 7%.
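
    A minimal sketch of zoneout regularization on a GRU in the spirit of SDZ: units whose mask fires keep their previous hidden state, so only part of the parameters is updated at each step. The per-step surprisal signal and the way it modulates the zoneout rate are assumptions, not the exact TDPRS formulation.

    import torch
    import torch.nn as nn

    class ZoneoutGRU(nn.Module):
        def __init__(self, input_size, hidden_size, base_rate=0.1):
            super().__init__()
            self.cell = nn.GRUCell(input_size, hidden_size)
            self.hidden_size, self.base_rate = hidden_size, base_rate

        def forward(self, x, surprisal=None):          # x: (batch, time, input_size)
            h = x.new_zeros(x.size(0), self.hidden_size)
            for t in range(x.size(1)):
                h_new = self.cell(x[:, t], h)
                rate = self.base_rate
                if surprisal is not None:              # higher surprisal -> more zoneout (assumed)
                    rate = torch.clamp(self.base_rate * (1 + surprisal[:, t:t + 1]), 0.0, 0.9)
                if self.training:
                    mask = torch.bernoulli(torch.full_like(h, 1.0) * rate)
                else:
                    mask = rate                        # expected mask at test time
                h = mask * h + (1 - mask) * h_new      # zoned-out units keep the old state
            return h

    rnn = ZoneoutGRU(input_size=2, hidden_size=32)     # e.g. (lat, lon) per GPS point
    print(rnn(torch.randn(16, 50, 2)).shape)           # torch.Size([16, 32])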

  • Transform Electric Power Curve into Dynamometer Diagram Image Using Deep Recurrent Neural Network

    Junfeng SHI  Wenming MA  Peng SONG  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2018/05/09  Vol: E101-D No:8  Page(s): 2154-2158

    To learn the working situation of rod-pumped wells underground, we usually need to analyze dynamometer diagrams, which are generated by a load sensor and a displacement sensor. Rod-pumped wells are usually located in places with extreme weather, and these sensors are installed on special oil equipment in the open air. Over time, the sensors become prone to generating unstable and incorrect data. Unfortunately, load sensors are too expensive to reinstall frequently. As a result, the dynamometer diagrams sometimes cannot support an accurate diagnosis. In contrast, the electric motor, an indispensable component of a rod-pumped well, has a much longer life and is far less affected by the weather. The electric power curve during a swabbing period can also reflect the working situation underground, but it is much harder to interpret than the dynamometer diagram. This letter presents a novel deep learning architecture that transforms the electric power curve into a dimensionless dynamometer diagram image. We conduct experiments on a real-world dataset, and the results show that our method achieves impressive transformation accuracy.
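
    A minimal sketch of such a transformation network (assumptions: a fixed-length power curve and a small grayscale output image; layer sizes are illustrative), using a recurrent encoder followed by a dense decoder that emits the diagram image.

    import torch
    import torch.nn as nn

    class Power2Diagram(nn.Module):
        def __init__(self, hidden=128, img_size=64):
            super().__init__()
            self.img_size = img_size
            self.encoder = nn.GRU(input_size=1, hidden_size=hidden, num_layers=2,
                                  batch_first=True)
            self.decoder = nn.Sequential(
                nn.Linear(hidden, 512), nn.ReLU(),
                nn.Linear(512, img_size * img_size), nn.Sigmoid())

        def forward(self, power):                  # power: (batch, time)
            _, h = self.encoder(power.unsqueeze(-1))
            img = self.decoder(h[-1])              # decode from the final hidden state
            return img.view(-1, 1, self.img_size, self.img_size)

    model = Power2Diagram()
    diagram = model(torch.randn(4, 200))           # 4 swabbing periods, 200 samples each
    print(diagram.shape)                           # torch.Size([4, 1, 64, 64])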

  • Design and Implementation of Deep Neural Network for Edge Computing

    Junyang ZHANG  Yang GUO  Xiao HU  Rongzhen LI  

     
    PAPER-Fundamentals of Information Systems

    Publicized: 2018/05/02  Vol: E101-D No:8  Page(s): 1982-1996

    In recent years, deep-learning-based image recognition, speech recognition, text translation and related applications have brought great convenience to people's lives. With the advent of the era of the Internet of Everything, running computationally intensive deep learning algorithms on resource-limited edge devices is a major challenge. For an edge-oriented vector processor combined with a specific neural network model, we propose a new data layout that places the input feature maps in DDR memory and rearranges the convolution kernel parameters in the core memory banks. To address the difficulty of parallelizing two-dimensional matrix convolution, we propose parallelizing the convolution computation along the third (channel) dimension, and we fuse the rectified linear unit (ReLU) activation with the pooling operation by initializing the max-pooling vector register to zero, reducing repeated accesses to intermediate data. On the basis of the single-core implementation, a multi-core implementation scheme for the Inception structure is proposed. Finally, based on the proposed vectorization method, we implement five neural network models, namely AlexNet, VGG16, VGG19, GoogLeNet and ResNet18, and present performance statistics and analysis for a CPU, a GTX 1080 Ti GPU and the FT2000. Experimental results show that the vector processor has computational advantages over the CPU and GPU and can run large-scale neural network models in real time.
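
    The ReLU/pooling fusion can be checked with a few lines of NumPy: if the running maximum of the pooling window starts at zero (the vector register preset to 0), the pooled result already equals ReLU followed by max pooling, so the intermediate activation never needs to be stored. The shapes and the 2x2 pool are illustrative.

    import numpy as np

    def fused_relu_maxpool(x, k=2):
        """k x k max pooling whose running maximum starts at 0, fusing ReLU into the pool."""
        h, w = x.shape[0] // k, x.shape[1] // k
        out = np.zeros((h, w), dtype=x.dtype)        # accumulator initialised to zero
        for i in range(k):
            for j in range(k):
                out = np.maximum(out, x[i::k, j::k][:h, :w])
        return out

    def maxpool(x, k=2):
        """Standard k x k max pooling (accumulator initialised to -inf)."""
        h, w = x.shape[0] // k, x.shape[1] // k
        out = np.full((h, w), -np.inf, dtype=x.dtype)
        for i in range(k):
            for j in range(k):
                out = np.maximum(out, x[i::k, j::k][:h, :w])
        return out

    x = np.random.randn(8, 8).astype(np.float32)
    assert np.allclose(fused_relu_maxpool(x), maxpool(np.maximum(x, 0.0)))
    print("fused ReLU+maxpool matches separate ReLU then maxpool")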

  • Efficient Mini-Batch Training on Memristor Neural Network Integrating Gradient Calculation and Weight Update

    Satoshi YAMAMORI  Masayuki HIROMOTO  Takashi SATO  

     
    PAPER-Neural Networks and Bioengineering

    Vol: E101-A No:7  Page(s): 1092-1100

    We propose an efficient training method for memristor neural networks. The proposed method is suitable for mini-batch-based training, a common technique for various neural networks. By integrating the two processes of gradient calculation in the backpropagation algorithm and weight update in the write operation to the memristors, the proposed method accelerates the training process and also eliminates the external computing resources required by the existing method, such as multipliers and memories. Through numerical experiments, we demonstrate that the proposed method converges twice as fast as the existing method while retaining the same level of classification accuracy.
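
    A minimal NumPy sketch of the integration idea for a single-layer softmax classifier: each sample's rank-1 gradient is written to the (memristor-like) weight array as soon as it is computed within the mini-batch, instead of being accumulated in external memory first. The sizes, learning rate and idealised analogue write are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(10, 64))          # crossbar conductances (10 classes)

    def train_minibatch(W, X, y, lr=0.05):
        for x, label in zip(X, y):                    # one write per sample
            logits = W @ x
            p = np.exp(logits - logits.max())
            p /= p.sum()
            p[label] -= 1.0                           # softmax cross-entropy gradient
            W -= lr * np.outer(p, x)                  # gradient applied as it is computed
        return W

    X = rng.normal(size=(32, 64))                     # one mini-batch of 32 samples
    y = rng.integers(0, 10, size=32)
    W = train_minibatch(W, X, y)
    print(np.linalg.norm(W))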

  • Submodular Based Unsupervised Data Selection

    Aiying ZHANG  Chongjia NI  

     
    PAPER-Speech and Hearing

    Publicized: 2018/03/14  Vol: E101-D No:6  Page(s): 1591-1604

    Automatic speech recognition (ASR) and keyword search (KWS) have increasingly found their way into our everyday lives, and their success comes down to many factors, among which the large amount of speech data used for acoustic modeling is the key one. However, acquiring large amounts of transcribed speech data is difficult and time-consuming for some languages, especially low-resource languages. Under low-resource conditions, it therefore matters which data are transcribed for acoustic modeling in order to improve the performance of ASR and KWS. There are two ways to obtain acoustic data for acoustic modeling: using target-language data, and using large amounts of data from other source languages for cross-lingual transfer. In this paper, we propose several approaches for efficiently selecting acoustic data for acoustic modeling. For target-language data, a submodular-based unsupervised data selection approach is proposed, which selects more informative and representative utterances for manual transcription. For source-language data, we propose a submodular multilingual data selection approach based on utterances that are highly misclassified as the target language, and a knowledge-based group multilingual data selection approach. Using the selected multilingual data to train a multilingual deep neural network for cross-lingual transfer improves the ASR and KWS performance on the target language. Compared with a language-identification-based multilingual data selection approach, the proposed approach also obtains better results. We further analyze and compare the influence of the language factor and the acoustic factor on ASR and KWS performance, and compare the influence of different amounts of target-language data on ASR and KWS performance under monolingual and cross-lingual conditions, from which several significant conclusions are drawn.
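
    A minimal sketch of a greedy facility-location selection step, as one concrete instance of submodular unsupervised data selection; the cosine-similarity objective and the utterance embeddings stand in for the paper's actual objective and features, and are assumptions.

    import numpy as np

    def greedy_facility_location(features, k):
        """features: (n, d) utterance embeddings; returns indices of k selected items."""
        f = features / np.linalg.norm(features, axis=1, keepdims=True)
        sim = f @ f.T                                   # pairwise cosine similarity
        covered = np.zeros(len(f))                      # current best coverage per item
        selected = []
        for _ in range(k):
            gains = np.clip(sim - covered[:, None], 0, None).sum(axis=0)
            gains[selected] = -np.inf                   # never pick the same item twice
            best = int(np.argmax(gains))
            selected.append(best)
            covered = np.maximum(covered, sim[:, best])
        return selected

    rng = np.random.default_rng(1)
    pool = rng.normal(size=(500, 40))                   # e.g. 500 utterance embeddings
    print(greedy_facility_location(pool, k=10))         # indices to send for transcription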

  • Image-Based Food Calorie Estimation Using Recipe Information

    Takumi EGE  Keiji YANAI  

     
    PAPER-Machine Vision and its Applications

    Publicized: 2018/02/16  Vol: E101-D No:5  Page(s): 1333-1341

    Recently, mobile applications for recording everyday meals have drawn much attention for self-managed dietary assessment. However, most applications simply return the calorie values associated with the estimated food categories, or require users to indicate the rough amount of food manually. In fact, estimating food calories from a food photo with practical accuracy has not yet been achieved and remains an unsolved problem. In this paper, we propose estimating food calories from a food photo by simultaneously learning food calories, categories, ingredients and cooking directions using deep learning. Since food calories generally correlate strongly with food categories, ingredients and cooking directions, we expect simultaneous training to boost performance compared to independent single-task training. To this end, we use a multi-task CNN. In addition, we construct two datasets: a calorie-annotated recipe dataset collected from Japanese recipe sites on the Web, and a dataset collected from an American recipe site. In the experiments, we trained both multi-task and single-task CNNs and compared them. As a result, the multi-task CNN achieved better performance on both food category estimation and food calorie estimation than the single-task CNNs. By introducing the multi-task CNN, the correlation coefficient improved by 0.039 for the Japanese recipe dataset and by 0.090 for the American recipe dataset compared to the single-task CNN. In addition, we show that the proposed multi-task CNN-based method outperforms previously proposed search-based methods.
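
    A minimal sketch of a multi-task CNN with a shared backbone, a calorie-regression head and a category-classification head (the ingredient and cooking-direction heads of the paper are omitted); layer sizes, the class count and the unit loss weights are illustrative assumptions.

    import torch
    import torch.nn as nn

    class MultiTaskFoodCNN(nn.Module):
        def __init__(self, n_categories=100):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.calorie_head = nn.Linear(64, 1)        # regression head (kcal)
            self.category_head = nn.Linear(64, n_categories)

        def forward(self, x):
            z = self.backbone(x)
            return self.calorie_head(z).squeeze(1), self.category_head(z)

    model = MultiTaskFoodCNN()
    images = torch.randn(8, 3, 224, 224)
    calories, categories = torch.rand(8) * 800, torch.randint(0, 100, (8,))
    pred_cal, pred_cat = model(images)
    loss = nn.functional.mse_loss(pred_cal, calories) \
           + nn.functional.cross_entropy(pred_cat, categories)   # joint multi-task loss
    loss.backward()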

  • Object Specific Deep Feature for Face Detection

    Xianxu HOU  Jiasong ZHU  Ke SUN  Linlin SHEN  Guoping QIU  

     
    PAPER-Machine Vision and its Applications

    Publicized: 2018/02/16  Vol: E101-D No:5  Page(s): 1270-1277

    Motivated by the observation that certain convolutional channels of a convolutional neural network (CNN) exhibit object-specific responses, we seek to discover and exploit the convolutional channels of a CNN whose neurons are activated by the presence of specific objects in the input image. We develop a method for explicitly fine-tuning a pre-trained CNN to induce an object-specific channel (OSC) and for systematically identifying it for human faces. In this paper, we introduce a multi-scale approach to constructing robust face heatmaps based on OSC features for rapidly filtering out non-face regions, thus significantly improving search efficiency for face detection. We show that multi-scale OSC can be used to develop simple and compact face detectors with state-of-the-art performance in unconstrained settings.
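
    A minimal sketch of the multi-scale heatmap construction: run the image through a convolutional stack at several scales, take the activation map of one assumed object-specific channel, resize each map back to the input size and average them. The conv stack and the channel index are placeholders, not the fine-tuned network of the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    conv = nn.Sequential(nn.Conv2d(3, 32, 5, padding=2), nn.ReLU(),
                         nn.Conv2d(32, 64, 5, padding=2), nn.ReLU())
    OSC_INDEX = 17                                      # hypothetical face-selective channel

    def osc_heatmap(image, scales=(0.5, 1.0, 1.5)):
        h, w = image.shape[-2:]
        maps = []
        for s in scales:
            scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                                   align_corners=False)
            act = conv(scaled)[:, OSC_INDEX:OSC_INDEX + 1]      # OSC activation map
            maps.append(F.interpolate(act, size=(h, w), mode="bilinear",
                                      align_corners=False))
        return torch.stack(maps).mean(dim=0)            # average over scales

    heat = osc_heatmap(torch.randn(1, 3, 128, 128))
    print(heat.shape)                                   # torch.Size([1, 1, 128, 128])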

  • Simultaneous Object Segmentation and Recognition by Merging CNN Outputs from Uniformly Distributed Multiple Viewpoints

    Yoshikatsu NAKAJIMA  Hideo SAITO  

     
    PAPER-Machine Vision and its Applications

    Publicized: 2018/02/16  Vol: E101-D No:5  Page(s): 1308-1316

    We propose a novel object recognition system that is able to (i) work in real time while reconstructing segmented 3D maps and simultaneously recognizing objects in a scene, (ii) manage various kinds of objects, including those with smooth surfaces and those drawn from a large number of categories, by utilizing a CNN for feature extraction, and (iii) maintain high accuracy no matter how the camera moves, by distributing the viewpoints for each object uniformly and aggregating the recognition results from the distributed viewpoints with equal weight. Through experiments, the advantages of our system over current state-of-the-art object recognition approaches are demonstrated on the UW RGB-D Dataset and Scenes and on our own scenes, prepared to verify the effectiveness of the viewpoint-class-based approach.
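
    The equal-weight aggregation step can be written in a few lines: per-viewpoint softmax outputs for one object are simply averaged, so no single camera pose dominates the final label (the class probabilities below are illustrative).

    import numpy as np

    def aggregate_viewpoints(probabilities):
        """probabilities: (n_viewpoints, n_classes) softmax outputs for one object."""
        fused = np.mean(probabilities, axis=0)          # every viewpoint gets equal weight
        return int(np.argmax(fused)), fused

    views = np.array([[0.7, 0.2, 0.1],                  # viewpoint 1: class 0 likely
                      [0.4, 0.5, 0.1],                  # viewpoint 2: ambiguous
                      [0.8, 0.1, 0.1]])                 # viewpoint 3: class 0 likely
    label, fused = aggregate_viewpoints(views)
    print(label, fused.round(2))                        # 0 [0.63 0.27 0.1 ]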

  • Self-Supervised Learning of Video Representation for Anticipating Actions in Early Stage

    Yinan LIU  Qingbo WU  Liangzhi TANG  Linfeng XU  

     
    LETTER-Pattern Recognition

    Publicized: 2018/02/21  Vol: E101-D No:5  Page(s): 1449-1452

    In this paper, we propose a novel self-supervised method for learning a video representation that can anticipate the video category from only a short clip. The key idea is to employ a Siamese convolutional network to model self-supervised feature learning as two different image-matching problems. By using frame encoding, the proposed video representation can be extracted at different temporal scales. We refine the training process with a motion-based temporal segmentation strategy. The learned representations can be applied not only to action anticipation but also to action recognition. We verify the effectiveness of the proposed approach on both action anticipation and action recognition using two datasets, namely UCF101 and HMDB51. The experiments show that we achieve results comparable to state-of-the-art self-supervised learning methods on both tasks.
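
    A minimal sketch of the Siamese formulation: two inputs pass through the same convolutional encoder and a contrastive loss scores whether they match, which serves as the self-supervised pretext task; the encoder, input sizes and margin are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SiameseEncoder(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))

        def forward(self, a, b):                         # shared weights for both branches
            return F.normalize(self.net(a), dim=1), F.normalize(self.net(b), dim=1)

    def contrastive_loss(za, zb, match, margin=0.5):
        d = 1 - (za * zb).sum(dim=1)                     # cosine distance
        return (match * d + (1 - match) * F.relu(margin - d)).mean()

    enc = SiameseEncoder()
    za, zb = enc(torch.randn(8, 3, 112, 112), torch.randn(8, 3, 112, 112))
    loss = contrastive_loss(za, zb, match=torch.randint(0, 2, (8,)).float())
    loss.backward()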

  • Simple Feature Quantities for Analysis of Periodic Orbits in Dynamic Binary Neural Networks

    Seitaro KOYAMA  Shunsuke AOKI  Toshimichi SAITO  

     
    LETTER-Nonlinear Problems

    Vol: E101-A No:4  Page(s): 727-730

    A dynamic binary neural network has ternary connection parameters and can generate various binary periodic orbits. In order to analyze the dynamics, we present two feature quantities that characterize the stability and transient phenomena of a periodic orbit. Calculating these feature quantities, we investigate the influence of connection sparsity on the stability of a target periodic orbit corresponding to a circuit control signal. As the sparsity increases, the stability of the target periodic orbit at first tends to strengthen; it then tends to weaken, and various transient phenomena appear. In the sparsest case, the network has many periodic orbits without transient phenomena.
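
    A minimal sketch, under simplifying assumptions (signum activation, a small network, the target orbit taken as the one reached from the all-ones state), of the kind of feature quantities described above: the fraction of initial states attracted to the target periodic orbit as a stability proxy, and the mean transient length before entering it.

    import itertools
    import numpy as np

    rng = np.random.default_rng(3)
    N = 7
    W = rng.choice([-1, 0, 1], size=(N, N), p=[0.3, 0.4, 0.3])   # ternary connections

    def step(x):                                  # binary state in {-1, +1}^N
        return np.where(W @ x >= 0, 1, -1)

    def find_orbit(x0):
        """Follow the trajectory from x0 and return the states on its periodic orbit."""
        seen, traj, x = {}, [], x0
        while tuple(x) not in seen:
            seen[tuple(x)] = len(traj)
            traj.append(tuple(x))
            x = step(x)
        return set(traj[seen[tuple(x)]:])         # states from the first revisit onward

    def orbit_features(target):
        falls, transients = 0, []
        for bits in itertools.product([-1, 1], repeat=N):
            x, t = np.array(bits), 0
            while tuple(x) not in target and t < 2 ** N:
                x, t = step(x), t + 1
            if tuple(x) in target:
                falls += 1
                transients.append(t)
        return falls / 2 ** N, float(np.mean(transients))

    orbit = find_orbit(np.ones(N, dtype=int))
    print(len(orbit), orbit_features(orbit))      # orbit length, (attraction rate, mean transient)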

  • ECG-Based Heartbeat Classification Using Two-Level Convolutional Neural Network and RR Interval Difference

    Yande XIANG  Jiahui LUO  Taotao ZHU  Sheng WANG  Xiaoyan XIANG  Jianyi MENG  

     
    PAPER-Biological Engineering

    Publicized: 2018/01/12  Vol: E101-D No:4  Page(s): 1189-1198

    Arrhythmia classification based on the electrocardiogram (ECG) is crucial for automatic cardiovascular disease diagnosis. The classification methods used in current practice largely depend on hand-crafted features. However, extracting hand-crafted features may introduce significant computational complexity, especially in transform domains. In this study, an accurate method for patient-specific ECG beat classification is proposed, which adopts morphological features and timing information. For the morphological features of a heartbeat, an attention-based two-level 1-D CNN is incorporated to automatically extract features at different granularities by focusing on various parts of the heartbeat. For the timing information, the difference between the previous and post RR intervals is computed as a dynamic feature. Both the extracted morphological features and the interval difference are fed to a multi-layer perceptron (MLP) for classifying ECG signals. In addition, to reduce the memory required to store ECG data and to denoise the signals to some extent, an adaptive heartbeat normalization technique is adopted, which includes amplitude unification, resolution modification and signal differencing. On the MIT-BIH arrhythmia database, the proposed classification method achieved sensitivity Sen=93.4% and positive predictivity Ppr=94.9% for ventricular ectopic beat (VEB) detection, sensitivity Sen=86.3% and positive predictivity Ppr=80.0% for supraventricular ectopic beat (SVEB) detection, and overall accuracy OA=97.8% at 6-bit ECG signal resolution. Compared with state-of-the-art automatic ECG classification methods, these results show that the proposed method achieves comparable heartbeat classification accuracy even though the ECG signals are represented at a lower resolution.
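
    A minimal sketch of combining beat morphology with timing information: a 1-D CNN extracts features from a heartbeat segment, the pre/post RR-interval difference is appended as a scalar, and an MLP classifies the beat. The attention mechanism and the two-level structure of the paper are omitted; sizes and the five-class output are assumptions.

    import torch
    import torch.nn as nn

    class BeatClassifier(nn.Module):
        def __init__(self, beat_len=250, n_classes=5):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
                nn.Conv1d(16, 32, 7, padding=3), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
                nn.Flatten())
            self.mlp = nn.Sequential(nn.Linear(32 + 1, 64), nn.ReLU(),
                                     nn.Linear(64, n_classes))

        def forward(self, beat, rr_diff):              # beat: (B, beat_len), rr_diff: (B,)
            feats = self.cnn(beat.unsqueeze(1))
            return self.mlp(torch.cat([feats, rr_diff.unsqueeze(1)], dim=1))

    model = BeatClassifier()
    logits = model(torch.randn(16, 250), torch.randn(16))
    print(logits.shape)                                # torch.Size([16, 5])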

  • Sequential Convolutional Residual Network for Image Recognition

    Wonjun HWANG  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2018/01/18  Vol: E101-D No:4  Page(s): 1213-1216

    In this letter, we propose a sequential convolutional residual network. We first analyze the tangled network architecture using simplified equations and determine the critical point at which to untangle the complex architecture. Although the residual network performs well, its learning efficiency at deeper layers is not as high as expected because the network is excessively intertwined. To solve this problem, we propose a network in which information is transmitted sequentially: the output of the neighboring layer is added to the input of the current layer, and the result is iteratively passed to the next sequential layer. The proposed network can thus improve learning efficiency and performance by successfully mitigating the complexity of deep networks. We show that the proposed network performs well on the CIFAR-10 and CIFAR-100 datasets. In particular, we show that the proposed method is superior to the baseline method as the depth increases.
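
    A minimal sketch of one reading of the sequential residual idea: each block's output is added to its input and the sum is passed on to the next block in sequence. This is an interpretation of the abstract, not the paper's exact definition, and the widths and depth are illustrative.

    import torch
    import torch.nn as nn

    def conv_block(ch):
        return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                             nn.BatchNorm2d(ch), nn.ReLU())

    class SequentialResidualNet(nn.Module):
        def __init__(self, ch=32, depth=6, n_classes=10):
            super().__init__()
            self.stem = nn.Conv2d(3, ch, 3, padding=1)
            self.blocks = nn.ModuleList(conv_block(ch) for _ in range(depth))
            self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(ch, n_classes))

        def forward(self, x):
            x = self.stem(x)
            for block in self.blocks:
                x = block(x) + x          # neighbouring output added to the current input,
            return self.head(x)           # then passed on to the next sequential layer

    print(SequentialResidualNet()(torch.randn(2, 3, 32, 32)).shape)   # torch.Size([2, 10])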

  • Deep Neural Network Based Monaural Speech Enhancement with Low-Rank Analysis and Speech Present Probability

    Wenhua SHI  Xiongwei ZHANG  Xia ZOU  Meng SUN  Wei HAN  Li LI  Gang MIN  

     
    LETTER-Noise and Vibration

    Vol: E101-A No:3  Page(s): 585-589

    A monaural speech enhancement method combining a deep neural network (DNN) with low-rank analysis and the speech presence probability is proposed in this letter. Low-rank and sparse analysis is first applied to the noisy speech spectrogram to obtain an approximate low-rank representation of the noise. Then a joint-feature training strategy for DNN-based speech enhancement is presented, which helps the DNN better predict the target speech. To reduce the residual noise in highly overlapping regions and the high-frequency domain, speech presence probability (SPP) weighted post-processing is employed to further improve the quality of the speech enhanced by the trained DNN model. Compared with supervised non-negative matrix factorization (NMF) and the conventional DNN method, the proposed method achieves better speech enhancement performance under both stationary and non-stationary conditions.
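
    A minimal NumPy sketch of two of the ingredients mentioned above: a truncated-SVD low-rank approximation of the noisy magnitude spectrogram as a rough noise representation, and SPP-weighted post-processing of an enhanced spectrogram. The rank, the crude SPP estimate and the noise floor are assumptions, and the trained DNN is replaced by a simple spectral-subtraction stand-in.

    import numpy as np

    def low_rank_part(spectrogram, rank=2):
        """Rank-r approximation of a (freq, time) magnitude spectrogram via truncated SVD."""
        u, s, vt = np.linalg.svd(spectrogram, full_matrices=False)
        return (u[:, :rank] * s[:rank]) @ vt[:rank]

    def spp_postprocess(enhanced, noisy, spp, floor=0.1):
        """Keep the enhanced output where speech is likely, fall back to a noise floor elsewhere."""
        return spp * enhanced + (1 - spp) * floor * noisy

    noisy = np.abs(np.random.randn(257, 100))               # |STFT| of a noisy utterance
    noise_estimate = np.maximum(low_rank_part(noisy), 0.0)  # low-rank part as a rough noise proxy
    enhanced = np.maximum(noisy - noise_estimate, 0.0)      # stand-in for the trained DNN output
    spp = enhanced / (enhanced + noise_estimate + 1e-8)     # crude speech presence probability
    print(spp_postprocess(enhanced, noisy, spp).shape)      # (257, 100)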

  • Corpus Expansion for Neural CWS on Microblog-Oriented Data with λ-Active Learning Approach

    Jing ZHANG  Degen HUANG  Kaiyu HUANG  Zhuang LIU  Fuji REN  

     
    PAPER-Natural Language Processing

    Publicized: 2017/12/08  Vol: E101-D No:3  Page(s): 778-785

    Microblog data contain rich information about real-world events and have great commercial value, so microblog-oriented natural language processing (NLP) tasks have attracted considerable attention from researchers. However, the performance of microblog-oriented Chinese word segmentation (CWS) based on deep neural networks (DNNs) is still not satisfactory. One critical reason is that the existing microblog-oriented training corpus is inadequate for training effective weight matrices for DNNs. In this paper, we propose a novel active learning method to extend the scale of the training corpus for DNNs. Because microblogs contain a large number of partially overlapping sentences, it is difficult to select samples with high annotation value from raw microblogs during the active learning procedure. To select samples with higher annotation value, a parameter λ is introduced to control the number of repeatedly selected samples, and various strategies are adopted to measure the overall annotation value of a sample during the active learning procedure. Experiments on the benchmark datasets of NLPCC 2015 show that our λ-active learning method outperforms the baseline system and the state-of-the-art method. The results also demonstrate that the performance of DNNs trained on the extended corpus is significantly improved.
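
    A minimal sketch of the selection step with a repetition-control parameter λ: candidates are ranked by model uncertainty, and heavily overlapping sentences are admitted only while the number of such repeats stays below λ. The uncertainty scores, the character-overlap measure and λ itself are placeholders for the paper's actual strategies.

    def select_for_annotation(sentences, uncertainty, k, lam=2, overlap_thresh=0.6):
        def overlap(a, b):                              # character-level Jaccard overlap
            sa, sb = set(a), set(b)
            return len(sa & sb) / max(len(sa | sb), 1)

        ranked = sorted(range(len(sentences)), key=lambda i: -uncertainty[i])
        selected, repeats = [], 0
        for i in ranked:
            if any(overlap(sentences[i], sentences[j]) > overlap_thresh for j in selected):
                if repeats >= lam:                      # λ caps repeatedly selected samples
                    continue
                repeats += 1
            selected.append(i)
            if len(selected) == k:
                break
        return [sentences[i] for i in selected]

    pool = ["转发微博 今天天气不错", "今天天气不错 出去走走", "新手机发布了", "股市大涨"]
    print(select_for_annotation(pool, uncertainty=[0.9, 0.85, 0.7, 0.6], k=3))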

  • DNN-Based Speech Synthesis Using Speaker Codes

    Nobukatsu HOJO  Yusuke IJIMA  Hideyuki MIZUNO  

     
    PAPER-Speech and Hearing

    Publicized: 2017/11/01  Vol: E101-D No:2  Page(s): 462-472

    Deep neural network (DNN)-based speech synthesis can produce more natural synthesized speech than conventional HMM-based speech synthesis. However, it has not been clarified whether the quality of the synthesized speech can be improved by utilizing a multi-speaker speech corpus. To address this problem, this paper proposes DNN-based speech synthesis using speaker codes as a way to improve the performance of the conventional speaker-dependent DNN-based method. To model speaker variation in the DNN, an augmented feature (the speaker code) is fed to the hidden layer(s) of the conventional DNN. This paper investigates the effectiveness of introducing speaker codes into DNN acoustic models for speech synthesis on two tasks: multi-speaker modeling and speaker adaptation. For the multi-speaker modeling task, the proposed method trains the connection weights of the whole DNN using a multi-speaker speech corpus. When performing multi-speaker synthesis, the speaker code corresponding to the selected target speaker is fed to the DNN to generate that speaker's voice. When performing speaker adaptation, a set of connection weights of the multi-speaker model is re-estimated to generate a new target speaker's voice. We investigated the relationship between prediction performance and DNN architecture through objective measurements. Objective evaluation experiments revealed that the proposed model outperformed conventional methods (HMMs, speaker-dependent DNNs, and multi-speaker DNNs based on a shared hidden-layer structure). Subjective evaluation results showed that the proposed model again outperformed the conventional methods (HMMs, speaker-dependent DNNs), especially when only a small number of target-speaker utterances were available.
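
    A minimal sketch of feeding a speaker code into a hidden layer: a one-hot speaker code is concatenated with the first hidden layer's output so that one network can generate different speakers' acoustic features. The input/output dimensions and the single injection point are illustrative assumptions.

    import torch
    import torch.nn as nn

    class SpeakerCodeDNN(nn.Module):
        def __init__(self, n_speakers=10, ling_dim=300, acoustic_dim=187, hidden=512):
            super().__init__()
            self.layer1 = nn.Sequential(nn.Linear(ling_dim, hidden), nn.ReLU())
            self.rest = nn.Sequential(
                nn.Linear(hidden + n_speakers, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, acoustic_dim))
            self.n_speakers = n_speakers

        def forward(self, linguistic, speaker_id):
            code = nn.functional.one_hot(speaker_id, self.n_speakers).float()
            h = self.layer1(linguistic)
            return self.rest(torch.cat([h, code], dim=1))   # speaker code joins the hidden layer

    model = SpeakerCodeDNN()
    out = model(torch.randn(4, 300), torch.tensor([0, 3, 3, 7]))
    print(out.shape)                                        # torch.Size([4, 187])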

  • End-to-End Exposure Fusion Using Convolutional Neural Network

    Jinhua WANG  Weiqiang WANG  Guangmei XU  Hongzhe LIU  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2017/11/22  Vol: E101-D No:2  Page(s): 560-563

    In this paper, we describe the direct learning of an end-to-end mapping between under-/over-exposed images and well-exposed images. The mapping is represented as a deep convolutional neural network (CNN) that takes multiple-exposure images as input and outputs a high-quality image. Our CNN has a lightweight structure yet gives state-of-the-art fusion quality. Furthermore, for a given pixel, the influence of the surrounding pixels increases as their distance decreases; if only the pixels within the convolution kernel neighborhood are considered, the final result suffers. To overcome this problem, the size of the convolution kernel is often increased, but this also increases the complexity of the network (too many parameters) and the training time. In this paper, we present a method in which a number of sub-images of the source image are obtained using the same CNN model, providing more neighborhood information for the convolution operation. Experimental results demonstrate that the proposed method achieves better performance in terms of both objective evaluation and visual quality.
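
    A minimal sketch of the end-to-end fusion mapping: the under- and over-exposed images are stacked along the channel axis and a small fully convolutional network predicts the fused image. Layer widths are illustrative, and the paper's sub-image strategy for enlarging the effective neighbourhood is not reproduced here.

    import torch
    import torch.nn as nn

    class ExposureFusionCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),   # 6 = under + over exposed RGB
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

        def forward(self, under, over):
            return self.net(torch.cat([under, over], dim=1))

    model = ExposureFusionCNN()
    fused = model(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
    print(fused.shape)                                       # torch.Size([1, 3, 256, 256])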

  • Learning Deep Relationship for Object Detection

    Nuo XU  Chunlei HUO  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2017/09/28  Vol: E101-D No:1  Page(s): 273-276

    Object detection has been a hot topic in image processing, computer vision and pattern recognition. In recent years, training a model from labeled images using machine learning techniques has become popular. However, the relationship between training samples is usually ignored by existing approaches. To address this problem, a novel approach is proposed that trains a Siamese convolutional neural network on feature pairs and fine-tunes the network with a small amount of training samples. Since the proposed method considers not only the discriminative information between objects and background but also the relationship between intra-class features, it outperforms the state of the art on real images.
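
    A minimal sketch of learning from feature pairs with shared (Siamese) weights: two proposal feature vectors go through the same small network and a contrastive loss pulls same-class pairs together while pushing object/background pairs apart. The feature dimension, margin and pair construction are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    embed = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64))

    def pair_loss(f1, f2, same, margin=1.0):
        d = F.pairwise_distance(embed(f1), embed(f2))        # shared embedding for both inputs
        return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

    f1, f2 = torch.randn(32, 256), torch.randn(32, 256)      # proposal feature pairs
    same = torch.randint(0, 2, (32,)).float()                # 1 if same class, 0 otherwise
    loss = pair_loss(f1, f2, same)
    loss.backward()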

  • Daily Activity Recognition with Large-Scaled Real-Life Recording Datasets Based on Deep Neural Network Using Multi-Modal Signals

    Tomoki HAYASHI  Masafumi NISHIDA  Norihide KITAOKA  Tomoki TODA  Kazuya TAKEDA  

     
    PAPER-Engineering Acoustics

    Vol: E101-A No:1  Page(s): 199-210

    In this study, toward the development of a smartphone-based monitoring system for life logging, we collect over 1,400 hours of data by recording both the outdoor and indoor daily activities of 19 subjects under practical conditions with a smartphone and a small camera. We then construct a large human activity database consisting of environmental sound signals, triaxial acceleration signals, and manually annotated activity tags. Using the constructed database, we evaluate the activity recognition performance of deep neural networks (DNNs), which have achieved great success in various fields, and apply DNN-based adaptation techniques to improve the performance with only a small amount of subject-specific training data. We experimentally demonstrate that: 1) using multi-modal signals, i.e., environmental sound and triaxial acceleration signals, with a DNN is effective for improving activity recognition performance; 2) the DNN can discriminate specified activities from a mixture of ambiguous activities; and 3) DNN-based adaptation methods are effective even when only a small amount of subject-specific training data is available.
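
    A minimal sketch of the multi-modal input idea: frame-level environmental-sound features and triaxial-acceleration features are concatenated and fed to a plain feed-forward DNN over activity classes. The feature dimensions, depth and number of activities are illustrative assumptions.

    import torch
    import torch.nn as nn

    class MultiModalDNN(nn.Module):
        def __init__(self, sound_dim=40, accel_dim=18, n_activities=12, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(sound_dim + accel_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_activities))

        def forward(self, sound_feats, accel_feats):
            return self.net(torch.cat([sound_feats, accel_feats], dim=1))

    model = MultiModalDNN()
    logits = model(torch.randn(16, 40), torch.randn(16, 18))   # 40-d sound + 18-d accel features
    print(logits.shape)                                        # torch.Size([16, 12])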

Results 261-280 of 855 hits